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Using computer simulations of over 100,000 atoms, the mechanism for the hydrophobic collapse of 
an idealized hydrated chain is obtained. This is done by coarse- graining the atomistic water molecule 
positions over 129,000 collective variables that represent the water density field and then using the 
string method in these variables to compute the minimum free energy pathway (MFEP) for the 
collapsing chain. The dynamical relevance of the MFEP (i.e. its coincidence with the mechanism 
of collapse) is validated a posteriori using conventional molecular dynamics trajectories. Analysis 
of the MFEP provides atomistic confirmation for the mechanism of hydrophobic collapse proposed 
by ten Wolde and Chandler. In particular, it is shown that lengthscale-dependent hydrophobic 
dewetting is the rate-limiting step in the hydrophobic collapse of the considered chain. 



I. INTRODUCTION 

This paper applies the string methodii^J^ to the phe- 
nomenon of hydrophobic collapse. It is shown that the 
method can describe complex dynamics in large atom- 
istic systems, ones for which other currently available 
rare event methods would seem intractable. Further- 
more, the string method is used to demonstrate that 
atomistic dynamics can be usefully projected onto that 
of a coarse-grained field. The specific application of the 
string method considered herein finds results that are 
consistent with the mechanism of hydrophobic collapse 
put forward by ten Wolde and Chandler 

The hydrophobic effect - or the tendency of oil and wa- 
ter to separate on mesoscopic lengthscales - has long been 
recognized as an important driving force in molecular 
assembly.^ It stabilizes the formation of micelles, protein 
tertiary structure, multi-protein assemblies, and cellu- 
lar membranes. Recent theoretical developments have 
helped establish a quantitative understanding of the ther- 
modynamics of hydrophobicity,^^^ but the dynamics of 
hydrophobic collapse remain poorly understood because 
it couples a large range of length- and timescales. Rel- 
evant processes include the atomic-scale motions of in- 
dividual water molecules, collective solvent density fluc- 
tuations, and the nanometer-scale movements of the hy- 
drophobic solutes. Bridging these dynamical hierarchies - 
and addressing the problem of complex dynamics in large 
systems - is a fundamental challenge for computational 
methods. 

We address this challenge using the string method in 
collective variables. ^^^^^ We consider the collapse of a 
chain composed of twelve spherical hydrophobes in an 
explicit solvent of approximately 34,000 water molecules. 
The system is studied by coarse-graining the water 
molecule positions onto a set of 129,000 collective vari- 
ables that represent the solvent density field and then us- 
ing the string method in these variables to compute the 
minimum free energy path (MFEP) for the hydrophobic 
collapse of the chain. Conventional molecular dynamics 



(MD) simulations are subsequently performed to confirm 
that this coarse-grained description adequately describes 
the mechanism of hydrophobic collapse. 

ten Wolde and Chandler— have previously reported 
simulations of a non-atomistic model for the hydrated 
chain considered herein. It was found that the key step 
in the collapse dynamics is a collective solvent density 
fluctuation that is nucleated at the hydrophobic surface 
of the chain. However, it was not clear whether this 
proposed mechanism captured the essential features of 
hydrophobic collapse or whether it was an artifact of the 
model. Atomistic simulations were needed to resolve the 
issue. 

Previous efforts to characterize the mechanism of hy- 
drophobic collapse using atomistic computer simulations 
neither confirm nor disprove the mechanism proposed 
by ten Wolde and Chandler. In recent work, for exam- 
ple, molecular dynamics (MD) simulations showed that 
dewetting accompanies the collapse of hydrophobes in 
water jiSiiiiii^ii^ However, the rate-limiting step - and thus 
the mechanism for hydrophobic collapse - was not charac- 
terized. Specifically, in Ref. 12, MD trajectories were ini- 
tiated at various separation distances for a pair of melet- 
tin protein dimers. Observation of the collapse dynamics 
was observed only when the initial configuration was on 
the "near" side of the free energy barrier, but the actual 
nature of that barrier - and the dynamics of crossing it 
- were not studied. In Ref. [l^, the thermodynamics and 
solvation of a hydrophobic chain was studied as a func- 
tion of its radius of gyration, but again, the dynamics of 
collapse was not characterized. 

The results presented here are the first atomistic simu- 
lations to explicitly confirm the mechanism of hydropho- 
bic collapse put forward by ten Wolde and Chandler.^ 
In particular, we show that the rate-limiting step in 
hydrophobic collapse coincides with a collective solvent 
motion and that performing the rate-limiting step in- 
volves performing work almost exclusively in the sol- 
vent coordinates. We further show that the solvation 
free energy along the MFEP can be decomposed into 
small- and large- lengthscale contributions. This analy- 
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sis demonstrates that atomistic solvent energetics can be 
quantitatively modeled using a grid-based solvent density 
field, and it suggests that the rate-limiting step for hy- 
drophobic collapse coincides with lengthscale-dependent 
hydrophobic dewetting. 

Combined, the results presented in this study provide 
(1) evidence that the string method is a powerful tool 
for atomistically simulating complex dynamics in large 
systems, (2) a proof of principle that atomistic solvent 
dynamics can be usefully projected onto that of a coarse- 
grained field, and (3) an atomistic demonstration that 
dewetting is key to the dynamics of hydrophobic collapse. 



II. ATOMISTIC MODEL, COARSE-GRAINING, 
AND THE STRING METHOD 

A. The system 

We consider the atomistic version of the hydrated hy- 
drophobic chain studied by ten Wolde and Chandler 
The chain is composed of twelve spherical hydrophobes, 
each of diameter 7.2 A and mass 70 amu. These are 
ideal hydrophobes, as they exert purely repulsive inter- 
actions upon the oxygen atoms of the water molecules. 
Consecutive hydrophobes in the unbranched chain inter- 
act via harmonic bonds, and the chain is made semi-rigid 
by a potential energy term that penalizes its curvature. 
The chain is hydrated with approximately 34000 rigid 
water molecules in an orthorhombic simulation box with 
periodic boundary conditions. All simulations were per- 
formed at 300 K. Full details of the system are provided 
in the Supporting Information Sec. A. 

In describing hydrophobic collapse, it is necessary to 
choose an appropriate thermodynamic ensemble for the 
simulations. Nanometer-scale fluctuations in solvent den- 
sity are suspected to play a key role in these dynam- 
ics. Use of the NVT ensemble (with a 1 g/cm^ density 
of water) might suppress these density fluctuations and 
bias the calculated mechanism. We avoid this problem 
with a simple technique that is based on the fact that 
under ambient conditions, liquid water is very close to 
phase coexistence. Indeed, it is this proximity that leads 
to the possibility of large lengthscale hydrophobicity^^ii^ 
By placing a fixed number of water molecules at 300 
K in a volume that corresponds to an average density 
of less than 1 g/cm^, we obtain a fraction of the sys- 
tem at the density of water vapor and the majority at a 
density of bulk water. Since we are not concerned with 
solvent fluctuations on macroscopic lengthscales, the dif- 
ference between simulating bulk water at its own vapor 
pressure compared to atmospheric pressure is completely 
negligible. This strategy has been previously employed 
to study the dewetting transition between solvophobic 
surfaces. ^^^^^ To ensure that the liquid- vapor interface 
remains both flat and well-distanced from the location of 
the chain, we repel particles from a thin layer at the top 
edge of the simulation box, as is discussed in Supporting 



Information Sec. A. 



B. Coarse-graining 

Throughout this study, we simulate the hydrated chain 
using atomistic MD, but we employ the string method us- 
ing collective variables that describe the solvent density 
field. A coarse-graining algorithm is developed to con- 
nect these atomistic and collective variable representa- 
tions of the solvent. Following ten Wolde and Chandler,^' 
the simulation box is partitioned into a three-dimensional 
lattice (48x48x56) of cubic cells with sidelength I = 2.1 
A. We label the cells with the vector k = (/Cx, /Cy, /Cz), 
where each /c^ takes on integer values bounded as fol- 
lows: 1 < fcx < 48, 1 < /Cy < 48, and 1 < h < 56. On 
this grid, the molecular density, p(r), is coarse grained 
into the field Pk, where 



J dr p{r) 



(1) 



C=x,y,z 



Here, the integral extends over the volume of the system, 
1^ denotes the unit vector in the (^^ Cartesian direction, 
and the coarse graining function, ^/c(x), must be normal- 
ized, i.e., 1 = XI /c ^k{x). The particular function that we 
have chosen to use is 



dy ^{x - y) 
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where 



x) = (2^G^)-^I^ exp(-xV2a2), 



(3) 



(J = lA, and /i/c(x) is unity when x is in the k^^ interval 
and zero otherwise. Since <l>/e(x) is normalized, Xlk^k = 
N is the total number of water molecules. In effect, this 
choice spreads the atomistic density field p(r) over the 
lengthscale <j and bins it into a grid of lenthscale / in such 
a way as to preserve normalization. While we have found 
this choice of coarse graining function to be convenient, 
others are possible. 

Figs. [T]A. and B illustrate the coarse-graining proce- 
dure. In part A, the solvent is schematically shown before 
and after coarse-graining. In part B, the same mapping is 
shown for the actual system considered herein. Cells are 
shaded white when their solvent occupation, Pk, is less 
than half of its bulk average value of (P)buik = c = 0.3 
molecules. Small local density fluctuations are seen 
throughout the simulation box, as is expected for an in- 
stantaneous solvent configuration. 

In addition to visualizing solvent density, the coarse- 
graining algorithm is useful for controlling the solvent 
density in MD simulations. For example, if it is desired 
that a particular cell k exhibit a solvent occupation PjJ, 
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FIG. 1: (A) The coarse- graining procedure is schematically 
shown to project the atomistic solvent density onto a discrete 
grid. (B) The same procedure is shown for an instantaneous 
solvent configuration of the actual system. Grid cells con- 
taining less solvent density than c/2 are colored white; the 
remaining cells are left transparent against the blue back- 
ground. The hydrophobic chain is shown in red. (C) The 
potential in Eq. |4]is used to restrain atomistic solvent den- 
sity to the reference distribution at the far left. With larger k,, 
reported numerically in units of 2kBT/c^, the average atomic 
solvent density reproduces the reference distribution in detail 
(see text for notation). 



the following potential energy term can be used to derive 
the appropriate forces on the atoms in the simulation, 



1 



(4) 



where hi is an appropriate force constant. This technique 
is illustrated in Fig. [T]G. The left-most panel shows a den- 
sity distribution that is exceedingly unlikely for a simu- 
lation of ambient liquid water. Panels to the right show 
the average solvent density distribution from MD simula- 
tions that are restrained to this unlikely reference distri- 
bution with increasing k,. Force constant values greater 
than 2kBT/c^^ in which the simulation incurs an ener- 
getic penalty of at least ksT for placing the average bulk 
density in a cell that is restrained to be empty, ensure 
that the reference distribution is recovered in detail. 



C. String method in collective variables 

We use the string method in collective variables to cal- 
culate the minimum free energy path (MFEP) for the 



collapse of the hydrated chain^^i^i^ To explain, we intro- 
duce some notation. Let x = (xc,w) be the position 
vector of length n = 3x 12 + 3x3xA/'for the atomistic 
representation of the entire system, where Xc is the po- 
sition vector for the atoms in the chain, and w is the 
position vector for the atoms in the water molecules. 
Similarly, let z(x) = (xc,P) be the vector of length 
M = 3 X 12 + 48 X 48 X 56 for the collective variable 
representation of the system, where the elements of P 
are defined in Eq. [TJ 

The MFEP, z*(a), is parameterized by the string co- 
ordinate a, where a = corresponds to the collapsed 
chain and a = 1 corresponds to the extended chain. It 
obeys the condition 



dz*{a) 
da 



parallel to 



dF{7.*{a)) 



where 



F(z) = -r^\n{Tl5{zi-Zi{^)) 



(5) 
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is the free energy surface defined in the collective vari- 
ables, and 



M,,(z) = exp[/?F(z) 



(7) 
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Here, angle brackets indicate equilibrium expectation val- 
ues, [3 = {kBT)~^ is the reciprocal temperature, and rrik 
is the mass of the atom corresponding to coordinate Xk. 

If the employed collective variables are adequate to de- 
scribe the mechanism of the reaction (here, the hydropho- 
bic collapse), then it can be shown that the MFEP is the 
path of maximum likelihood for reactive MD trajectories 
that are monitored in the collective variables.^ In the cur- 
rent application, we shall check the adequacy of the col- 
lective variables a posteriori by running MD trajectories 
that are initiated from the presumed rate-limiting step 
along the MFEP (i.e. the configuration of maximum free 
energy) and verifying that these trajectories lead with ap- 
proximately equal probability to either the collapsed or 
extended configurations of the chain (see Section [III Cp . 

The string method yields the MFEP by evolving a 
parameterized curve (i.e. a string) according to the 
dynamics^^^^ 



dz*{a,t) 

m 



\-M fr^^fr. ,..SF(Z*(M)) 



+ A(a,t) 



dz^{a,t) 
da 



(8) 



where the term X{a^t)dz^ {a^t)/da enforces the con- 
straint that the string remain parameterized by normal- 
ized arc- length. The endpoints of the string evolve by 
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steepest decent on the free energy surface, 
dz*{a,t) dFiz*ia,t)) 



dt 



dzi 



(9; 



for a = and a = 1. These artificial dynamics of th( 
string yield the MFEP, which satisfies Eq. O 

In practice, the string is discretized using N^^ config- 
urations of the system in the collective variable repre 
sentation. The dynamics in Eqs. [8] and [9] are then ac- 
complished in a three-step cycle where (i) the endpoim 
configurations of the string are evolved according to Eq 
[9] and the rest of the configurations are evolved accord 
ing to the first term in Eq. [HI (ii) the string is (option- 
ally) smoothed, and (iii) the string is reparameterizec 
to maintain equidistance of the configurations in the dis- 
cretization. Step (i) requires evaluation of the mean force 
elements dF(7?)/dzi and the tensor elements Mij{z) ai 
each configuration. These terms are obtained using re- 
strained atomistic MD simulations of the sort illustratec 
for the solvent degrees of freedom in Fig. [T]G. The de- 
tails of the string calculation are provided in Supporting 
Information Sec. B. 



III. HYDROPHOBIC COLLAPSE OF A 
HYDRATED CHAIN 

A. Minimum free energy path 

Fig. [2] shows the MFEP for the hydrophobic collapse 
of the hydrated chain. It is obtained using the string 
method in the collective variables for the chain atom 
positions and the grid-based solvent density field. The 
string is discretized using A^d = 40 configurations of the 
system, and it is evolved using the steepest descent dy- 
namics in Eqs. [8] and [9] to yield the MFEP that satisfies 
Eq. [5l The free energy profile is obtained by integrating 
the projection of the mean force along the MFEP, using 



F*{a) 



r VF(z*(a')) ■dz*(a')- 
Jo 



(10) 




Upon discretization, s = NdXa is the configuration num- 
ber that indexes the MFEP. The resolution of F^{a) in 
Fig. [2] could be improved by employing a larger A^d, but 
at larger computational cost. 

To the extent that the collective variables employed 
in this study adequately describe the mechanism of hy- 
drophobic collapse, the variable s parameterizes the re- 
action coordinate for the collapse dynamics; this assump- 
tion is checked below with the use of straightforward MD 
simulations. The statistical errors in the free energy pro- 
file between consecutive configurations are approximately 
the size of the plotted circles, and the small features in 
the profile at configurations 27 and 31 are due to noise 
in the convergence of the string calculation. The lower 
panel of Fig. [2] presents configurations along the MFEP 
in the region of the free energy barrier. As in Fig. [H 
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FIG. 2: The minimum free energy path obtained using the 
string method. Above, the free energy profile exhibits a single 
peak at configuration 22. Below, the configurations of the 
path in the vicinity of the free energy peak are shown with 
configuration numbers indicated in white text. 



lattice cells with less than half of the bulk solvent occu- 
pation number fade to white. 

The free energy profile in Fig. [2] is dominated by a 
single barrier at configuration 22, where a liquid- vapor 
interface is formed at a bend in the hydrophobic chain. 
The sharply curved chain geometry presents an extended 
hydrophobic surface to molecules located in the crook of 
the bend, an environment that is analogous to that ex- 
perienced by water trapped between hydrophobic plates 
and known to stabilize large-lengthscale solvent density 
fluctuations ^^i^i^ The barrier in the calculated free en- 
ergy profile clearly coincides with a collective motion in 
the solvent variables. 



B. The solvation free energy 

A simple theory can be constructed to understand the 
contributions to the free energy profile in Fig. [2] and to 
test the validity of coarse-graining solvent interactions. 
Noting that our choice of collective variables for the 
string calculation neglects the configurational entropy of 
the chain, the free energy profile F can be decomposed 
into the configurational potential energy of the chain Ec 
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and the free energy of hydrating the chain Ff^, 
F*{a)=E^{a)+F^{a). 



(11) 



The term E^^a) is easily evaluated from the chain po- 
tential energy term, yielding the components of the free 
energy profile shown in Fig. [3]A.. 

We focus our attention on Fh. The solvation free en- 
ergy for hydrophobes of idealized geometry is well under- 
stood. For small- lengthscale hydrophobes (< 1 nm), this 
energy scales linearly with the solute volume, whereas for 
large-lengthscale hydrophobes, it scales linearly with the 
solute surface area J^i^S But theoretical prediction of the 
solvation free energy for more complicated solute geome- 
tries is not necessarily trivial. For example, depending on 
its configuration, some parts of the chain considered here 
might be solvated like a large-lengthscale hydrophobe, 
whereas other parts might behave like a small-lengthscale 
hydrophobe. 

We have developed a technique for parsing a general 
hydrophobic solute into components that belong to ei- 
ther the small-lengthscale or large-lengthscale regimes. 
We define the solvent-depleted volume as the continu- 
ous set of lattice cells that are, on average, occupied by 
less than 50% of the average bulk solvent value (this set 
includes both the volume that is excluded by the hard- 
sphere-like interactions between the solute and solvent 
interactions, as well as any additional volume in the vicin- 
ity of the solute that is not substantially occupied by the 
solvent). We also introduce a probe volume that is a cu- 
bic 4 X 4 X 4 set of cells, which is approximately the size 
at which the water solvation structure changes from the 
small-lengthscale to the large-lengthscale regime. 

The probe volume is used to determine whether a given 
section of the solvent-depleted volume is in the small- or 
large-lengthscale regime. This is done by moving the 
probe volume throughout the simulation box and deter- 
mining whether it can be fit into different portions of the 
solvent-depleted volume. If, at a given position, the vast 
majority of this probe volume (specifically 59 out of 64 
cells) fits within the solvent-depleted volume, then the 
cells in that portion of the solvent-depleted volume are 
included in the large-lengthscale component, Koi- Por- 
tions of the solvent-depleted volume that do not meet 
this criterion for any position of the probe volume are 
included in the small-lengthscale component, Kxt- The 
volumes Koi and 14xt, which are plotted in Fig. [3p, 
primarily include contributions from the collapsed por- 
tions and extended portions of the chain, respectively. 
The total volume Vtot is obtained by counting the total 
number of cells in the solvent-depleted volume, such that 

The total surface area, ^tot, and its large-lengthscale 
component, Acoi, were obtained by counting the number 
of external faces on the cells that comprise Vtot and Koi, 
respectively. The surface area of the small-lengthscale 
component was obtained using Aext = ^tot — ^coi- To 
account for the fact that we are calculating the area of a 
smooth surface by projecting it onto a cubic lattice, each 




5 10 15 20 25 30 35 40 

B 



g 600 




5 10 15 20 25 30 35 40 




5 10 15 20 25 30 35 40 



120 


D 




P 

^ 80 






O 40 






LU 














5 10 15 20 25 30 35 40 

Reaction Coordinate Index 



FIG. 3: A simple theory describes the solvation free energy of 
irregular hydrophobic solute geometries. (A) The calculated 
free energy profile F* is decomposed into contributions from 
the chain configurational potential energy Ec and the free en- 
ergy of solvation F^. (B) The volume of the hydrated chain 
is parsed into large-lengthscale (Vcoi) and small-lengthscale 
(Vext) components. (C) The surface area of the chain is simi- 
larly parsed. (D) The dotted line indicates the solvation free 
energy obtained using a relation (Eq. [12]) that is based on the 
volume- scaling of the small-lengthscale solvation free energy 
and the surface area-scaling of the large-lengthscale solvation 
free energy. The solvation free energy obtained from simula- 
tion (solid line) is included for comparison. 
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surface area term was also multiplied by a 2/3, a cor- 
rection factor that is exact for an infinitely large sphere. 
The surface area components are plotted in Fig. [3lG. 

We use the parsed components of the solute volume 
and surface area to estimate the solvation free energy for 
the chain along the minimum free energy path with the 
relationship 

Fh(a) = XVeM + 7^coi(c^). (12) 

The coefficients A and 7 are, respectively, the prefac- 
tors for the linear scaling of the solvation free energy 
in the small- lengthscale and large- lengthscale regimes, 
as obtained by calculations on a spherical hydrophobic 
solute. Their values are taken to be A = 8 mJ/(m^A) 
and 7 = 45 mJ/m^. 

The results of this calculation are presented in Fig. 
[3p. The dotted line indicates our theoretical estimate of 
the solvation free energy using Eq. [121 and the solid line 
indicates the data from the atomistic simulations (from 
Fig. [3]A.). The agreement is very good, and as is shown 
in Supporting Information Sec. C, it is better than can 
be obtained from solvation free energy estimates that ex- 
clusively consider either the solute volume or the solute 
surface area. This result demonstrates that atomistic 
solvent energetics can be quantitatively modeled using a 
grid-based solvent density field. Furthermore, we note 
in Fig. [3p and C that the collapsing chain first devel- 
ops large-lengthscale components in the vicinity of the 
peak in the free energy profile, which suggests that this 
peak coincides with the crossover from small- to large- 
lengthscale solvation (i.e. dewetting). This observation 
is consistent with the collective solvent density motion 
that is observed near the free energy barrier in Fig. [2l 



C. The committor function and a proof of principle 
for coarse graining 

The various assumptions that are employed in our im- 
plementation of the string method, including our choice 
of collective variables and the corresponding neglect of 
the chain configurational entropy, raise the possibility 
that the barrier in the free energy profile in Fig. [2] 
does not correspond to the dynamical bottleneck for hy- 
drophobic collapse. This is a general concern in trying 
to relate free energy calculations to dynamical quanti- 
ties, such as the reaction mechanism and the reaction 
rate. To confirm that the calculated free energy profile 
is dynamically relevant, we evaluate the committor func- 
tion at several configurations along the MFEP. The com- 
mittor function reports the relative probability that MD 
trajectories that are initialized from a particular collec- 
tive variable configuration first proceed to the extended 
configurations of the chain, as opposed to the collapsed 
configurations. For initial collective variable configura- 
tions that coincide with the true dynamical bottleneck, 
the committor function assumes a value of exactly 0.5. 



We first evaluate the committor function at configura- 
tion 22, which corresponds to the peak in the calculated 
free energy profile in Fig. O The committor function is 
obtained by performing straightforward MD trajectories 
from initial conditions that are consistent with this collec- 
tive variable configuration, and then tallying the fraction 
of those trajectories whose endpoints are "extended," as 
opposed to "collapsed". The initial conditions for these 
trajectories are obtained from a 20 ps MD trajectory that 
is restrained to the collective variables for configuration 
22; the atomistic coordinates of the restrained trajectory 
are recorded every picosecond. From each set of atomistic 
coordinates, an unrestrained trajectory was run for 150 
ps forwards and backwards in time with the initial atom- 
istic velocity vector drawn from the Maxwell-Boltzmann 
distribution at 300 K. To decide whether a given unre- 
strained trajectory terminates in either an extended or 
collapsed configuration of the chain, we employ an order 
parameter based on the number of chain atoms that are 
within 1.25 A of another chain atom to which it is not 
directly bonded. If the number of non-bonded contacts 
exceeds three, the chain is considered to be in a collapsed 
configuration; otherwise it is considered extended. This 
order parameter need only distinguish between collapsed 
and extended configurations of the chain; it need not be 
(and in fact is not) a good reaction coordinate. Fig. [H 
illustrates representative forward and backwards unre- 
strained MD trajectories. 

The committor function at configuration 22 is 0.3 ±0.1, 
suggesting that the calculated free energy barrier is 
slightly biased towards the basin of stability for the col- 
lapse chain. However, this deviation from the ideal value 
of 0.5 is only marginally statistically significant, and we 
emphasize that the committor function is exponentially 
sensitive in the region of the dynamical bottleneck. To 
illustrate this point, we repeat the evaluation of the com- 
mittor function at configuration 20, which is seen in Fig. 
[2] to be near the barrier peak but slightly closer to the 
collapsed configurations of the chain, as well as at config- 
uration 24, which is slightly closer to the extended con- 
figurations. We find the committor function at configu- 
ration 20 to be 0.00 ±0.05 and the value at configuration 
24 to be 0.90 ± 0.05. Only very minor shifts along the 
MFEP dramatically change the value of the committor 
function. 

Given the extreme sensitivity of the commitor function 
near the dynamical bottleneck, the fact that configura- 
tion 22 gives rise to a significant fraction of trajectories 
that proceed to both the collapsed and extended basins 
is very significant. Furthermore, the direction in which 
the committor function changes as a function of the con- 
figuration number is as is expected in the vicinity of the 
dynamical bottleneck. We conclude that the barrier in 
the calculated free energy profile correctly characterizes 
the true free energy barrier - thus identifying the rate- 
limiting step for the collapse dynamics. 

The committor function calculations, which are based 
on straightforward MD simulations, indicate that our 
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FIG. 4: Snapshots from unrestrained MD trajectories that 
are initiated from the calculated free energy barrier at con- 
figuration 22. The instantaneous atomistic configurations are 
visualized in the collective variable representation, using the 
same technique that was introduced in Figs. [T]A and B. 

choice of collective variables provides a reasonable de- 
scription of the mechanism of hydrophobic collapse. 
This result yields atomistic support for the strategy of 
coarse-graining solvent dynamics. Using (i) that it is 
straightforward^ to obtain the stochastic dynamics in 
the collective variables for which the most likely reaction 
pathway is the same as the mechanism obtained using the 
string method and (ii) that the mechanism for hydropho- 
bic collapse obtained in the employed collective variables 
agrees with the mechanism obtained using atomistic dy- 
namics, we obtain a proof of principle that atomistic sol- 
vent dynamics can be usefully projected onto a coarse- 
grained field. 

The coarse-grained dynamics obtained via this argu- 
ment are^ 

AT 

^^2F^Y.^ll\z{t))r^,{t), (13) 

where irii{t) is a white noise satisfying {r]i{t)r]j{t)) = 
Sij5{t — t') and t is an artificial time that is scaled by 
the friction coefficient 7. The computational feasibil- 
ity of directly integrating these coarse-grained dynam- 



ics, however, hinges on the cost of calculating the mean 
force elements dFiz^/dzj and the tensor elements Mij{z) 
and dMij{z)/dzj^ since these terms are required at ev- 
ery coarse-grained timestep. It is thus encouraging that 
the free energy surface in the collective variables can be 
quantitatively modeled in lieu of atomistic simulations, 
by using Eqs. [TTland[T2l Similar approximations for the 
tensor elements might also be possible. 



D. The rate-limiting step 

Using the calculated MFEP and committor function 
values, we have established that the rate-limiting step for 
the hydrophobic collapse of the hydrated chain coincides 
with a collective solvent motion. Using a simple analysis 
of the solvation free energy, we find that this collective 
solvent motion is consistent with lengthscale-dependent 
dewetting. However, it remains to be shown whether 
the rate-limiting step involves performing work in the 
solvent, or in the chain, degrees of freedom. This is an 
important distinction. If the latter case is true, then 
dewetting merely accompanies hydrophobic collapse as a 
spectator. But if the former case is true, then dewetting 
is the rate-limiting step to hydrophobic collapse. 

To address this issue, we again decompose the free en- 
ergy profile, this time into contributions from work per- 
formed along the solvent and the chain collective vari- 
ables. The definition of the free energy profile in Eq. [10] 
can be written more explicitly as 

F%a) = r V,F(7r{a'))-d^,{a') 
Jo 

+ E r^P.F{z%a'))dP^{a') (14) 

where Vci^(z*(a')) is the vector of mean forces acting 
on the chain atom positions at configuration s = A^d x ol 
along the MFEP, and V p^F(z* {a')) is the corresponding 
mean force on the solvent collective variable Pk. The full 
free energy profile is obtained by setting the volume V 
equal to the entire simulation box. But to understand the 
role of solvent in hydrophobic collapse, it is informative 
to calculate the free energy profile using various smaller 
solvent volumes. 

The bottom curve in Fig. [5] is obtained from Eq. [Mlbv 
letting V be the empty set, thus eliminating the second 
term. The middle curve is obtained by letting V be the 
set of 8 X 8 X 8 lattice cells at the middle of the simulation 
box, as is indicated in the corresponding image, and the 
top curve is obtained by letting V be the middle set of 
20 X 20 X 20 lattice cells. Consideration of larger sets of 
lattice cells does not further alter the free energy profile, 
as is shown in Supporting Information Sec. D. The sol- 
vent variables for regions of space that are distant from 
the collapsing chain remain constant along the path and 
do not contribute to the free energy profile. 
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FIG. 5: The free energy profile from Fig. [2] is separated into 
contributions from the chain and solvent coordinates. In the 
bottom curve, only the first term in Eq. 1141 which corresponds 
to work performed in the chain coordinates, is included in the 
free energy profile. In the middle and top curves, solvent 
contributions are also included. The solvent volume included 
for each curve is indicated with a wire box. Inclusion of larger 
solvent volumes does not substantially change the profile. 



The bottom curve in Fig. [5] only shows the work per- 
formed on the chain atom positions during hydrophobic 
collapse. Remarkably, it lacks almost any free energy 
barrier, indicating that essentially no work is performed 
in the chain degrees of freedom in crossing the dynami- 
cal bottleneck. Instead, we see that the barrier emerges 
only upon inclusion of the work performed in the sol- 
vent collective variables. Performing the hydrophobic 
collapse involves traversing a free energy barrier that ex- 
ists only in the solvent coordinates. This figure shows 
that the dewetting transition not only accompanies the 
rate-limiting step for hydrophobic collapse, but that it is 
the rate-limiting step. 
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SUPPORTING INFORMATION 



A. Details of the system 

We consider a hydrophobic chain in exphcit solvent. 
The unbranched chain is composed of A^s = 12 spherical 
atoms of diameter 7.2 A and mass 70.73 amu. The inter- 
actions between the chain atoms were treated with the 
purely repulsive WCA potential {a = 6.4145A, e = 15 
kJ/mol), which provides a slightly softened hard-sphere 
interaction that is convenient for molecular dynamics 
(MD) simulations. Neighboring atoms are linked by har- 
monic bonds, using 



ond 



iVs-1 ^ 



(15) 



k=l 



where |r/e+i — r/e| is the distance between chain atoms 
at positions and r/e+i, l^ = 7.2 A, and /cb = 84.9102 
kJ/mol/A^. To add some rigidity to the chain, a har- 
monic potential is included to penalize its curvature. 



iVs-i 



angle 



(16) 



k=2 



where (pk is the angle between neighboring bond vec- 
tors (r/e+i - Tk) and(r/e - r/e_i), and = 11.1407 
k J / mol / radian^ . 

The chain was solvated with 33, 912 water molecules in 
an orthorhombic simulation cell with periodic boundary 
conditions and dimensions of 99.5 Ax 99.5 Ax 116.1 A. 
Interactions among water molecules are described with 
the SPC/E rigid water potential^ and the chain atoms 
interact with the oxygen atoms in the water molecules via 
WCA repulsions^^^'"^^^ (a = 4.617 A, e = 10 kJ/mol). Ah 
MD simulations were performed at 300 K using the Nose- 
Hoover thermostat and a timestep of 2 fs.^ Constant pres- 
sure conditions were maintained in the simulations using 
the liquid-vapor coexistence technique described in the 
text. To ensure that the liquid- vapor interface remains 
flat, as opposed to collapsing under the surface tension 
of the liquid, the solvent density restraint potential in 
Eq. m was applied to the top two layers of lattice cells in 
the simulation box with parameters = and = 100 
kJ/mol/molecule^. All MD calculations were performed 
using a modified version of the DL_P0LY_3 molecular 
simulation package.^ 



B. String method in collective variables 

The string method^i^ii^is a technique for calculating 
the committor function for a dynamical process. The 
constant-value contours of the committor function are 
approximated by a collection of hyperplanes that are per- 
pendicular to a given path (the string) . This approxima- 
tion can be justified within the framework of transition 



path theory (TPT),^^ which yields a variational criterion 
for the string such that the perpendicular hyperplanes 
optimally approximate the isocommittor surfaces. 

The string method in collective variables characterizes 
the committor function projected onto the space of col- 
lective variables. Provided that the collective variables 
adequately describe the reaction and that the reactive 
trajectories projected onto the space of collective vari- 
ables remain confined to a relativelly narrow tube, the 
approximation is not only optimal but also accurate. In 
this case, it can be shown that the string coincides with 
the minimum free energy path (MFEP) in the space of 
collective variables, which is also the path of maximum 
likelihood for the recation monitored in these variables. 

Unlike other free-energy mapping techniques, the 
string method focuses only on a linear subset of the space 
of collective variables. It thus scales independently of 
the dimensionality of the full free energy surface. The 
method does not require performance of long, reactive 
MD trajectories. Instead, a double-ended approach is 
employed in which the path is updated using short, re- 
strained MD simulations. 

A complete discussion of the implementation of the 
string method in collective variables can be found in Ref. 
10. We represent the string in the variables of the chain 
atom positions and the solvent cell densities, as is ex- 
plained in the primary text. Data needed for the calcu- 
lation of the minimum free energy path were obtained 
from restrained MD simulations. 



dF{z) 

dzi 



= \im 1^ {zi - Zi{x.))p^^z{^)dyi (17) 



and 



Mij{z)= lim 



where 



(18) 



/>^,,(x) = exp(-/5/7^,,(x))/Z,,,, (19) 

Z^^, = [ exp(-^/7,,,(x))c^x, (20) 
M 

and [/,,,(x) = F(x) + ^^(z,-z,(x))^ (21) 

i=l 

The string is converged to the MFEP using Eqs. [8] 
and [9l Between timesteps of these dynamics, the string 
is smoothed and reparameterized. The smoothing proce- 
dure is performed using 

z^'* = (1 - r])7r + 1(2^^"^ + z^+^), (22) 

where z^'* and z^ indicate the m*^ configuration of the 
smoothed and unsmoothed string, respectively. The pa- 
rameter 77 = 0.05 was chosen to be sufficiently small 
(0(A^^^), as is discussed in Ref. 10) that the smoothing 
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procedure does not effect the accuracy of the calculated 
MFEP. Reparameterization of the string was performed 
using the linear interpolation scheme described in Ref. 

The string was initialized from configurations of an 
MD trajectory in which the hydrated chain was artifi- 
cially extended from a collapsed configuration with the 
aid of a greatly magnified force constant in the chain 
potential energy term Single- In the first stage of the 
string calculation, the string was discretized using 20 con- 
figurations and evolved using large timesteps and weak 
collective variable restraints. Specifically, we employed 
chain atom restraint force constants of 2 kJ/mol/A^, sol- 
vent restraints of 10 kJ/mol/molecule^, and a steepest 
descent timestep of 300 fs. Restrained MD trajectories of 
10 ps were performed during this stage. This first stage 
of the string calculation included nine steepest descent 
timesteps. 

In a second stage of the string calculation, the num- 
ber of images used to represent the path was increased 
to 40, the chain atom restraints were increased to 5 
kJ/mol/A^, the solvent restraints were increased to 40 
kJ/mol/molecule^, the steepest descent timestep was de- 
creased to 40 fs, and the restrained MD trajectory time 
was increased to 20 ps. The second stage of the string 
calculation was terminated after six optimization steps, 
at which point reasonable convergence, as determined 
by monitoring changes in the path and its correspond- 
ing free energy profile, was obtained. Throughout both 
stages of the string calculation, the path endpoint image 
corresponding to the extended configuration was (after 
local relaxation) held fixed, and the other endpoint cor- 
responding to the collapsed chain was allowed to relax 
on the free energy surface according to Eq. [9l The final, 
converged string bears little resemblance to the initial 
guess. 

The string method is a local, rather than global, opti- 
mization scheme. Different minimized free energy paths 
might have been obtained from string calculations started 
with different initial paths. However, the simple topol- 
ogy of the chain, as well as the unrestrained MD simula- 
tions presented in the paper, suggest that the calculated 
MFEP is a reasonable characterization of the reaction 
mechanism for the hydrophobic collapse of the hydrated 
chain. 

All computations were performed in parallel using 2.2 
GHz AMD Operton processors. Each step in the first 
of the stage of the string calculation required 600 CPU 
hours. Each step in the second stage required 2400 CPU 
hours. Each evaluation of the committor function de- 
scribed in Sec. [IIIC] required 15,000 CPU hours. 



C. Alternative models for the solvation free energy 

Even with the aide of a one-parameter fit, the solvation 
free energy profiles estimated on the basis of only solvent- 
depleted surface area or only solvent-depleted volume do 




5 10 15 20 25 30 35 40 

250 I ' ' ' ' ' ' ' 1 




5 10 15 20 25 30 35 40 

Reaction Coordinate Index 

FIG. 6: (Supporting Information) Alternative models for the 
solvation free energy. The profile obtained from simulation is 
shown as a solid line, and that obtained using Eq. [T2]is shown 
as a dotted line. In Panel A, the dot-dashed line indicates 
the solvation free energy profile obtained from consideration 
of only the chain surface area, using Eq. 1231 In Panel B, 
the dot-dashed line indicates the solvation free energy profile 
obtained from consideration of only the chain volume, using 
Eq. [21 



not match the accuracy obtained using Eq. [12] in the 
text. In Fig. [6]A, we see the profile (dot-dashed line) 
obtained by fitting 7 in the expression 

Fh(a) =7^tot(c^). (23) 

The height of the shoulder at configurations 14-18 and 
the barrier peak region do not reproduce the simulated 
results (solid line) as well as Eq. [12] (dotted line). As 
is seen Fig. [6p, the solvation free energy profile (dot- 
dashed) that is obtained using the expression 

F,,{a) = ~XVtot{a) (24) 

is less accurate than that obtained using either Eqs. [T2] 
or [23] 
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D. Convergence of the free energy profile with 
increasing solvent box sizes 
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FIG. 7: (Supporting Information) Free energy profiles for the 
MFEP, calculated using Eq. [14] with V equal to the middle 
b X b X b lattice cells of the simulation box. 



We show that the free energy profile calculated using 
Eq. [m converges with respect to the solvent contribu- 
tion. The curves in Fig. [71 correspond to free energy 
profiles obtained with V equal to the middle b x b x b 
lattice cells of the simulation box. The profile changes 
dramatically with b until V fully encompasses the region 
of the bending chain, where the dewetting transition oc- 
curs. For b > 16, no major changes are seen in the profile, 
since the added contributions are from solvent cells that 
are distant from the collapsing chain. The small changes 
that are found with large b are due to statistical noise. 
This noise increases with larger b since the number of 
included solvent cells increases as b^. 
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